Results 1 - 13 of 13
1.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Subjects
Databases, Genetic; Proteins/genetics; Animals; Exons; Genomics; Humans; Internet; Mice; Molecular Sequence Annotation; Sequence Analysis
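The CCDS abstract above turns on one idea: a coding region qualifies only when two independent annotation sets describe it identically. A minimal sketch of that comparison follows, assuming each annotation is reduced to a chromosome, a strand and a tuple of coding-exon intervals. The `CDS` type, the `consensus` function and all coordinates are hypothetical illustrations, not the CCDS pipeline, which also applies quality assurance tests not modelled here.

```python
from typing import NamedTuple, Tuple


class CDS(NamedTuple):
    """Hypothetical, simplified coding-region record."""
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # sorted (start, end) coding intervals


def consensus(set_a, set_b):
    """Return the coding regions that are annotated identically in both sets."""
    return sorted(set(set_a) & set(set_b))


# Invented coordinates standing in for two annotation pipelines.
pipeline_a = [
    CDS("chr1", "+", ((100, 200), (300, 400))),
    CDS("chr2", "-", ((50, 120),)),
]
pipeline_b = [
    CDS("chr1", "+", ((100, 200), (300, 400))),
    CDS("chr2", "-", ((50, 125),)),  # differs by 5 bases, so not identical
]

shared = consensus(pipeline_a, pipeline_b)
print(shared)
```

Exact matching on the whole exon structure is the design choice here: a single differing boundary, as in the `chr2` record, excludes the region from the shared set.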
2.
Nucleic Acids Res ; 42(Database issue): D771-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316575

ABSTRACT

The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).


Subjects
Databases, Genetic; Genome; Molecular Sequence Annotation; Animals; Genome, Human; Genomics; Humans; Internet; Mice; Mice, Inbred NOD; Mice, Knockout; Rats; Swine/genetics; Zebrafish/genetics
3.
Database (Oxford) ; 2013: bat032, 2013.
Article in English | MEDLINE | ID: mdl-23729657

ABSTRACT

Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D having been bred to develop the disease spontaneously in a process that is similar to humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes, that are major contributors to T1D susceptibility or resistance, were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 mega base-pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes sensitive NOD mouse and 765 genes in homologous regions of the diabetes resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility. Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega-previous.sanger.ac.uk/info/data/mouse_regions.html#Idd


Subjects
Diabetes Mellitus, Type 1/genetics; Genetic Variation; Molecular Sequence Annotation; Animals; Base Pairing/genetics; Base Sequence; Genetic Loci/genetics; Genome/genetics; Humans; Mice; Mice, Inbred C57BL; Mice, Inbred NOD; Polymorphism, Single Nucleotide/genetics; Sequence Alignment; Sequence Analysis, DNA
4.
BMC Genomics ; 14: 332, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23676093

ABSTRACT

BACKGROUND: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. RESULTS: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. 
A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. CONCLUSIONS: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig's adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.


Subjects
Genomics; Immunity/genetics; Molecular Sequence Annotation; Swine/genetics; Swine/immunology; Animals; Cattle; Evolution, Molecular; Gene Duplication; Humans; Immunoglobulins/genetics; Mice; Models, Molecular; Protein Conformation; Receptors, Antigen, T-Cell/genetics; Receptors, KIR/genetics; Selection, Genetic; Species Specificity
5.
Database (Oxford) ; 2013: bat011, 2013.
Article in English | MEDLINE | ID: mdl-23589541

ABSTRACT

Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.


Subjects
Genome/genetics; Gorilla gorilla/genetics; Gorilla gorilla/immunology; Major Histocompatibility Complex/genetics; Sequence Analysis, DNA; Animals; Base Sequence; Chromosome Mapping; Humans; Multigene Family/genetics; Pan troglodytes/genetics
6.
Database (Oxford) ; 2012: bas009, 2012.
Article in English | MEDLINE | ID: mdl-22434843

ABSTRACT

Manual annotation of genomic data is extremely valuable for producing an accurate reference gene set, but it is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the 'Blessed' annotator and 'Gatekeeper' approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. DATABASE URL: http://vega.sanger.ac.uk/index.html.


Subjects
Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; Software; Animals; Humans; Mice; User-Computer Interface
7.
BMC Genomics ; 10: 606, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20003482

ABSTRACT

BACKGROUND: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. RESULTS: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. CONCLUSIONS: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. 
Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.


Subjects
Defensins/genetics; Mice, Inbred C57BL/genetics; Multigene Family; Amino Acid Sequence; Animals; Comparative Genomic Hybridization; Computational Biology; Databases, Nucleic Acid; Genome; Genomics; Humans; Mice; Molecular Sequence Data; Protein Isoforms/genetics; Pseudogenes; Sequence Alignment; Sequence Analysis, DNA
8.
Immunogenetics ; 60(1): 1-18, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18193213

ABSTRACT

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.


Subjects
Databases, Genetic; Genetic Variation/immunology; HLA Antigens/genetics; Haplotypes/genetics; Terminology as Topic; Computational Biology/methods; Computational Biology/trends; Genome, Human; Humans
9.
Genome Biol ; 8(8): R168, 2007.
Article in English | MEDLINE | ID: mdl-17705864

ABSTRACT

BACKGROUND: We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. RESULTS: Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. CONCLUSION: We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.


Subjects
Genome; Sus scrofa/genetics; Animals; Base Sequence; Chromosomes, Artificial, Bacterial/genetics; Chromosomes, Mammalian/genetics; Conserved Sequence; Cytochrome P-450 Enzyme System/genetics; Gene Order; Genome, Human/genetics; Humans; Mice; Molecular Chaperones/genetics; Molecular Sequence Data; Protein Tyrosine Phosphatase, Non-Receptor Type 1/genetics; Sequence Analysis, DNA; Vesicular Transport Proteins/genetics
10.
Genome Biol ; 7 Suppl 1: S4.1-9, 2006.
Article in English | MEDLINE | ID: mdl-16925838

ABSTRACT

BACKGROUND: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. RESULTS: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. CONCLUSION: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.


Subjects
Computational Biology/standards; Genome, Human; Genomics/standards; Proteins/genetics; Chromosome Mapping; Computational Biology/methods; Expressed Sequence Tags; Genes; Genomics/methods; Humans; Pseudogenes; RNA, Messenger/analysis; Reference Standards; Sequence Analysis, DNA; Sequence Analysis, RNA
11.
Genome Res ; 14(10A): 1888-901, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15364904

ABSTRACT

Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.


Subjects
Evolution, Molecular; Genome; Sequence Deletion; Animals; Mice; Multigene Family
12.
Genome Res ; 12(10): 1611-8, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368254

ABSTRACT

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.


Subjects
Biological Science Disciplines/methods; Computational Biology/methods; Algorithms; Animals; Biological Science Disciplines/trends; Computational Biology/trends; Computer Graphics; Database Management Systems; Databases, Genetic; Humans; Internet; Online Systems; Software; Software Design; Systems Integration
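The Bioperl abstract above describes sequence input/output modules that reduce common tasks to a few lines of code. As a rough illustration of that kind of task, the sketch below reads FASTA-formatted text and computes GC content. It is written in Python with the standard library only and does not use or reproduce the Bioperl API; `read_fasta`, `gc_fraction` and the example records are hypothetical names and data chosen for this sketch.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # The identifier is the first word of the header line.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C bases, or 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Invented two-record input standing in for a sequence file.
example = io.StringIO(">seq1 hypothetical record\nATGC\nGGCC\n>seq2\nATAT\n")
records = list(read_fasta(example))
for identifier, sequence in records:
    print(identifier, len(sequence), gc_fraction(sequence))
```

Writing the reader as a generator keeps memory use proportional to one record at a time, which matters once the input is a real sequence file and not a short string.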
13.
Genome Res ; 12(5): 749-59, 2002 May.
Article in English | MEDLINE | ID: mdl-11997341

ABSTRACT

The stem cell leukemia (SCL) gene encodes a bHLH transcription factor with a pivotal role in hematopoiesis and vasculogenesis and a pattern of expression that is highly conserved between mammals and zebrafish. Here we report the isolation and characterization of the zebrafish SCL locus together with the identification of three neighboring genes, IER5, MAP17, and MUPP1. This region spans 68 kb and comprises the longest zebrafish genomic sequence currently available for comparison with mammalian, chicken, and pufferfish sequences. Our data show conserved synteny between zebrafish and mammalian SCL and MAP17 loci, thus suggesting the likely genomic domain necessary for the conserved pattern of SCL expression. Long-range comparative sequence analysis/phylogenetic footprinting was used to identify noncoding conserved sequences representing candidate transcriptional regulatory elements. The SCL promoter/enhancer, exon 1, and the poly(A) region were highly conserved, but no homology to other known mouse SCL enhancers was detected in the zebrafish sequence. A combined homology/structure analysis of the poly(A) region predicted consistent structural features, suggesting a conserved functional role in mRNA regulation. Analysis of the SCL promoter/enhancer revealed five motifs, which were conserved from zebrafish to mammals, and each of which is essential for the appropriate pattern or level of SCL transcription.


Subjects
DNA-Binding Proteins/genetics; Gene Expression Regulation, Neoplastic/genetics; Leukemia-Lymphoma, Adult T-Cell/genetics; Proto-Oncogene Proteins; Transcription Factors/genetics; Zebrafish Proteins; 5' Untranslated Regions/genetics; Amino Acid Sequence; Animals; Basic Helix-Loop-Helix Transcription Factors; Cell Line; Chickens; Chromosomes, Artificial, P1 Bacteriophage/genetics; Cloning, Molecular; Conserved Sequence; DNA-Binding Proteins/biosynthesis; DNA-Binding Proteins/metabolism; Exons/genetics; Genetic Markers/genetics; Genetic Markers/physiology; Humans; Mice; Mice, Transgenic; Molecular Sequence Data; Poly A/metabolism; Promoter Regions, Genetic/genetics; Rats; Sequence Homology, Nucleic Acid; T-Cell Acute Lymphocytic Leukemia Protein 1; Tetraodontiformes; Transcription Factors/biosynthesis; Transcription Factors/chemistry; Zebrafish/genetics